Genome-wide discovery of di-nucleotide SSR markers based on whole genome re-sequencing data of Cicer arietinum L. and Cicer reticulatum Ladiz

Simple sequence repeats (SSRs) are valuable genetic markers due to their co-dominant inheritance, multi-allelic and reproducible nature. They have been largely used for exploiting genetic architecture of plant germplasms, phylogenetic analysis, and mapping studies. Among the SSRs, di-nucleotide repeats are the most frequent of the simple repeats distributed throughout the plant genomes. In present study, we aimed to discover and develop di-nucleotide SSR markers by using the whole genome re-sequencing (WGRS) data from Cicer arietinum L. and C. reticulatum Ladiz. A total of 35,329 InDels were obtained in C. arietinum, whereas 44,331 InDels in C. reticulatum. 3387 InDels with 2 bp length were detected in C. arietinum, there were 4704 in C. reticulatum. Among 8091 InDels, 58 di-nucleotide regions that were polymorphic between two species were selected and used for validation. We tested primers for evaluation of genetic diversity in 30 chickpea genotypes including C. arietinum, C. reticulatum, C. echinospermum P.H. Davis, C. anatolicum Alef., C. canariense A. Santos & G.P. Lewis, C. microphyllum Benth., C. multijugum Maesen, C. oxyodon Boiss. & Hohen. and C. songaricum Steph ex DC. A total of 244 alleles were obtained for 58 SSR markers giving an average of 2.36 alleles per locus. The observed heterozygosity was 0.08 while the expected heterozygosity was 0.345. Polymorphism information content was found to be 0.73 across all loci. Phylogenetic tree and principal coordinate analysis clearly divided the accessions into four groups. The SSR markers were also evaluated in 30 genotypes of a RIL population obtained from an interspecific cross between C. arietinum and C. reticulatum. Chi-square (χ2) test revealed an expected 1:1 segregation ratio in the population. These results demonstrated the success of SSR identification and marker development for chickpea with the use of WGRS data. The newly developed 58 SSR markers are expected to be useful for chickpea breeders.


Results
Genotyping. A total of 2.01 GB and 2.16 GB raw sequence reads of C. arietinum and C. reticulatum were generated from 150 bp paired-end sequencing. C. arietinum had 34.77 M reads and 33% guanine-cytosine (GC) content while C. reticulatum had 33.60 M reads and 34% GC content. The means of reads mapped to the C. arietinum reference genome were 97.56% and 96.62% in C. arietinum and C. reticulatum, respectively. Variant detection. Using variant calling pipeline, 3.9 M and 4.7 M variants were initially detected in C. arietinum and C. reticulatum genome, respectively. Out of all variants, a total of 3.26 M SNPs were identified in C. arietinum, by contrast 3.93 M in C. reticulatum compared to the reference genome. In total, 35,329 and 44,331 InDels were identified in the species of C. arietinum and C. reticulatum, respectively. A total of 3387 InDels with 2 bp length was detected in C. arietinum, there was 4704 in C. reticulatum. Among 8091 InDels, 58 di-nucleotide regions that were polymorphic between two species were selected and used for primer design (Table 1).

SSR validation in RIL population.
Designed primer pairs were used for validation in 30 chickpea genotypes of F 6 population obtained from an interspecific cross between C. arietinum and C. reticulatum. Out of SSR31 and SSR32, all primers were successfully amplified. The obtained PCR products were loaded on a polyacrylamide gel, and allele sizes were determined by comparing with C. arietinum and C. reticulatum. The difference of allele sizes was also confirmed in the gel. It was seen that all 30 genotypes carried one of the alleles which the parents had. While SSR5 and SSR10 produced suitable alleles in 30 RIL genotypes for 2-nucleotide polymorphism between female and male parents, SSR14 primer produced suitable alleles for 8-nucleotide polymorphism and SSR18 primer for 6-nucleotide polymorphism between C. arietinum and C. reticulatum (Table 1).
Chi-square (χ 2 ) values were calculated for each marker to test the fit of the markers in 30 genotypes representing the RIL population to the expected 1:1 expression ratio. Markers deviating from expected Mendelian ratios were determined by chi-square analysis ( Table 2). According to the results, it was determined that the markers were suitable for 1:1 expansion ratio, since the calculated p values for all markers except SSR20 were greater than 0.05.

SSR diversity in cultivated and wild populations.
For genetic diversity analysis, 30 genotypes obtained from cultivated and wild species were tested in polyacrylamide gel, bands were scored according to allele sizes. As a result of the analysis, a total of 244 alleles belonging to 41 different SSR loci were determined in 30 chickpea genotypes (Table 3). At the population level, allelic diversity in cultivated and wild populations was shown in Fig. 1. Total allele distribution was 63 in cultivars and 311 in wild genotypes. While a total of 110 alleles were determined in the genotypes of the C. reticulatum, 112 alleles were observed in the genotypes of the C. echinospermum. 89 alleles were determined in the population from distantly related wild species. The mean number of alleles (Na) for 30 genotypes was 2.36 ( Table 3). The highest number of alleles was obtained from the primers SSR3, SSR58 and SSR39 (Table 3). The number of effective alleles (Ne) varied between 0.75 and 3.74. SSR1 19 ,011,134-19,011,210  CTT CCA CGC GAG AGA  AAA AC   TGG CCA ATT TGA AAA  GAA AA  CT  176  180  182   SSR2  55,753,401-55,753,477  TTG CCC TGA TTT GAG  ATG TG   TTG GAA ATT CAA CCT ACA  CAA AAA  TA  158  160  160   SSR3  19,011,133-19,011,211  CTT CCA CGC GAG AGA  AAA AC   TGG CCA ATT TGA AAA  GAA AA  CT  176  180  182   SSR4  889,577-889,653  TGC CAG TTT TTA ACA GCA  TGA   CAG CAT TAT CTG CAA AAA  CAAA  AT  164  154  164   SSR5  994,011-994,087  TCC TTG TTT TAA TTC CTC  CATTG   TGA GAC TCG ACG CAT TTA  AGAA  TA  164  162  164   SSR6  1,322,967-1,323,044  TTC ATG ATG AGT GAA TGG  ATGAG   AAA TGG TGC ACG TGT  TTG TT  AT  167  157 (Table 3). Phylogenetic tree consisting of 30 chickpea genotypes was constructed based on the UPGMA clustering method with newly developed SSRs (Fig. 2). The chickpea genotypes were divided into four clusters, indicating clear separation between wild and cultivated species. Cluster I contained cultivated chickpeas including four kabuli and four desi chickpeas. Cluster II, III and IV consist of wild chickpea species, each representing C. echinospermum, C. reticulatum and other wild chickpea species, respectively.
The PCoA analysis confirmed the clusters of the phylogenetic tree (Fig. 3). Cultivated and wild genotypes did not cluster together. The two informative components explained 92.36% of the cumulative variance, PC1 and PC2 shared 53.72% and 38.64% variation, respectively.

Discussion
Using NGS technology is an effective tool for the identification of SSR markers. SSRs are valuable genetic markers due to their co-dominant inheritance, multi-allelic and reproducible nature 55 . In chickpea, large numbers of SSR markers have been identified and widely used for genetic diversity analysis, gene/QTL mapping, construction of linkage map, marker assisted selection (MAS) 33,56-59 . However, validation and selection of informative markers from such huge numbers of markers that show polymorphism in chickpea, is an excessive effort. In addition, the narrow genetic base in chickpea may can restrict use of the identified markers in genotyping studies because of their low intra-specific polymorphism among chickpea genotypes 23,30 . The NGS technologies have caused impressive advances in sequencing which creates high-throughput sequences to transform genotyping and plant breeding. It provides opportunities to perform high-throughput SSR identification. In present study, we developed genome-wide SSR markers from cultivated and wild chickpea genotypes. SSR marker development from genomic data has been reported for various crops such as sesame 60 , red clover 61 , peanut 62 , sweet potato 63 , faba bean 64 , lentil 65 .  Validation and polymorphic potential of SSRs. The utilization of genetic diversity in chickpea genetic resources is very important in order to utilize collections and improve breeding studies. Genetic diversity analysis in chickpea was previously performed using RAPD 18 , AFLP 68 , STMS 69 , SSRs 70,71 . In this study, the effectiveness of the developed markers was evaluated in 30 chickpea genotypes obtained from cultivated and wild species as well as 30 chickpea genotypes of F 6 population obtained from an interspecific cross between C. arietinum and C. reticulatum. The markers were effective for detection of a total of 244 alleles (Na). The mean of number of alleles (2.36) observed in this study are within the ranges revealed by various previous studies. For instance, the use of 33 SSR markers identifed a total of 111 alleles with an average of 3.7 alleles per locus in 155 chickpea genotypes 72 .   www.nature.com/scientificreports/ and the wild species. It is also clear as the wild progenitor, Cicer reticulatum showed close proximity with the cultivated chickpea. The other close connection was seen between C. reticultum and C. echinospermum. It can be supposed from this study that cluster analysis shows the effectiveness of the designed markers.

Distribution of variants in
The results of the present study revealed the success of SSR identification and marker development in chickpea using NGS genome data. The developed SSR markers were applied successfully for illuminating genetic diversity among cultivated and wild chickpea populations as well as validation in F 6 population obtained from an interspecific cross between C. arietinum and C. reticulatum. Therefore, newly developed 58 SSR markers are potentially useful for genetic studies of chickpea.
In conclusion, NGS strategy led to the discovery of a large number of microsatellites markers, providing thousands of SSRs for validation in chickpea. These new SSRs will become significant molecular tools for chickpea genetic breeding programs. Later, these markers could be integrated in genetic maps to be utilized in MAS.

Materials and methods
Plant material. C. arietinum L., CA 2969 and C. reticulatum Ladiz., AWC 602 were used as a genetic material for WGRS analysis. CA 2969 and AWC 602 chickpea genotypes were registered by USDA-ARS and Akdeniz University, Department of Field Crops, respectively. The important traits for these genotypes were given in Table 4. Developed SSRs were validated in 30 chickpea lines from a RIL population earlier developed by Sari et al. 83 and derived from an interspecific cross between CA 2969 and AWC 602. The markers were also  Table 4. Important morphological and the specific-known traits of the parents used for WGRS analysis (*Chrigui et al. 15 ).  www.nature.com/scientificreports/ reads at Macrogen Inc., (Macrogen, Seoul, Korea). WGRS data of two available genotypes were deposited into the National Center for Biotechnology Information (NCBI) Sequence-Read Archive (SRA) database. The raw data were demultiplexed using Je V1.2 85 , a quality control was performed for FASTQ Sanger files using fastp 86 , and reads with a Phred quality score below 15 were trimmed 87 . The cleaned data were aligned with kabuli reference genome 1.0 1 using Bowtie 2 with default parameters 88 in the Galaxy software (www. usega laxy. org). The created BAM files (*.bam) were analyzed using Freebayes (Galaxy Version 1.1.0.46-0) 89 , with simple diploid calling and filtering, and a minimum of 20 × coverage for variant detection. The obtained variant files were filtered using VCFfilter (Galaxy Version 1.0.0) and SNPs were chosen. Insertions and deletions from individual (*.vcf) files were later merged into a single VCF file using VCF genotypes (Galaxy Version 1.0.0).

Traits
The combined variant file was processed using Microsoft Excel to eliminate duplicated regions and organize the SSRs according to their sizes. SSR regions which have 2 bp long and polymorphic between parents were checked using the Integrated Genome Browser V9. 1.4. Primer design. For designing the primer pairs from the flanking sequences of identified SSRs, Primer3 software 90,91 was used with the parameters as follows: primer length of 18-27 nucleotides, melting temperatures of 55-65 °C, GC content of 30-70%, and predicted PCR products of 100-300 bp in length. The primer pairs were later controlled for possible duplication of sequences in the genome using IGB software.
The PCR reactions were performed using the M13 tailing PCR procedure 92 . The forward primers were tailed by adding an M13 sequence labeled with IRDye to the 5′ end. The following PCR protocol was applied: 95 °C initial denaturation for 5 min, 30 cycles at 95 °C for 30 s, annealing temperature 60 °C for 30 s, 72 °C for 1 min, followed by 9 cycles of 95 °C for 30 s, 55 °C for 30 s, 72 °C for 1 min, and then a final extension of 10 min at 72 °C. PCR products were loaded onto 8% denatured polyacrylamide gel and separated by 4300 DNA analyzer (LI-COR, Inc., Lincoln, Nebraska, USA). 1 kb size marker was used to score markers as 1 or 0 for the presence and absence of alleles.
Statistical analyses. RIL data was analyzed using MINITAB 19 software. A Chi square (χ 2 ) test was used to assess goodness of fit to the observed segregation ratios followed 3:1 ratio in the RIL population.
Genetic diversity and phylogeny analysis. Genetic diversity parameters such as number of alleles (Na), number of effective alleles (Ne), Shannon diversity index (I), expected heterozygosity (He), unexpected heterozygosity (uHe), observed heterozygosity (Ho) and Wright's fixation index (F) were shown using GenAlEx 6.5 93 . The phylogenetic tree was constructed in DARwin ver 5.0 software 94 using the unweighted pair group method with arithmetic mean (UPGMA) 95 clustering method and modified in FigTree v1.4.4 (http:// tree. bio. ed. ac. uk/ softw are/ figtr ee). Principal coordinate analysis (PCoA) was performed with GenAlEx 6.5 to evaluate the genetic relationships between populations. The Excel microsatellite toolkit 96 was used to measure polymorphism.

Data availability
The datasets generated and analysed during the current study are available in the National Center for Biotechnology Information (NCBI) Sequence-Read Archive (SRA) database with the accession number of PRJNA926661.